An Efficient In-Place 3D Transpose for Multicore Processors with Software Managed Memory Hierarchy
Authors
Abstract
3D transpose is an important operation in many large-scale scientific applications, such as seismic and medical imaging. This paper proposes a novel algorithm for fast in-place 3D transposition. The algorithm exploits SIMD multicore architectures with a software-managed memory hierarchy, features present in next-generation processors such as the Cell BE. It performs transposition at two levels of granularity: at the coarse level, transposition is logical and is done merely by transposing the address map at each access; at the fine-grain level, transposition is done physically. This mix combines the benefit of high on-chip bandwidth, obtained through large transfer sizes, with fine-grain SIMD operations. The transfer rate is further enhanced by batch-transposing spatially joined data along a major axis. Results on the Cell BE processor show substantial utilisation of on-chip communication bandwidth and negligible processing time.
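The two-level idea in the abstract can be illustrated with a minimal sketch. It is not the paper's Cell BE implementation: it uses no SIMD or DMA, and it assumes a cubic volume stored as a flat list, a tile size `b` that divides the edge length `n`, and a transpose that swaps the first and last axes. The function names `transpose_tiles_in_place` and `logical_index` are hypothetical. Elements are physically swapped only inside each tile (fine grain), while whole tiles never move; their coordinates are swapped in the address computation at access time (coarse grain).

```python
def transpose_tiles_in_place(vol, n, b):
    """Fine-grain step: physically swap axes 0 and 2 inside every b*b*b tile.

    vol is a flat list of n*n*n elements in row-major order (assumed layout).
    """
    for t0 in range(0, n, b):
        for t1 in range(0, n, b):
            for t2 in range(0, n, b):
                for i in range(b):
                    for j in range(b):
                        # k > i so that each pair is swapped exactly once
                        for k in range(i + 1, b):
                            p = ((t0 + i) * n + (t1 + j)) * n + (t2 + k)
                            q = ((t0 + k) * n + (t1 + j)) * n + (t2 + i)
                            vol[p], vol[q] = vol[q], vol[p]


def logical_index(x, y, z, n, b):
    """Coarse-grain step: flat index holding element (x, y, z) of the
    transposed volume. Tile coordinates are swapped here, at access time,
    so no tile is ever moved in memory."""
    tile0, off0 = divmod(x, b)
    tile2, off2 = divmod(z, b)
    return ((tile2 * b + off0) * n + y) * n + (tile0 * b + off2)


if __name__ == "__main__":
    n, b = 4, 2
    original = list(range(n ** 3))
    vol = list(original)
    transpose_tiles_in_place(vol, n, b)
    # Transposed element (x, y, z) must equal original element (z, y, x).
    ok = all(
        vol[logical_index(x, y, z, n, b)] == original[(z * n + y) * n + x]
        for x in range(n) for y in range(n) for z in range(n)
    )
    print("transpose consistent:", ok)
```

In this sketch the physical work touches only data inside one tile at a time, which is the property that would let a tile be fetched in one large transfer and permuted locally; the choice of tile size and axis permutation here is illustrative only.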
Similar Resources
Enhancing Visual Rendering on Multicore Accelerators with Explicitly Managed Memories
Recent electronic devices are equipped with processors extended with multicore accelerators to take advantage of the performance of acceleration co-processors. Applications on such high-end electronic products require the capability to run graphics-rich workloads. Scalable acceleration co-processors are frequently designed as multicores with explicitly managed memories. Such multicore ...
A Portable 3D FFT Package for Distributed-Memory Parallel Architectures
1 Introduction. Multidimensional FFTs are used frequently in engineering and scientific calculations, especially in image processing. Parallel implementations of FFT generally follow two approaches. One is the binary-exchange approach [1,2], where data exchanges take place in all pairs of processors with processor numbers differing by one bit. Another one is the transpose approach[...
Shared Memory Abstractions for Heterogeneous Multicore Processors
We are now seeing diminishing returns from classic single-core processor designs, yet the number of transistors available for a processor is still increasing. Processor architects are therefore experimenting with a variety of multicore processor designs. Heterogeneous multicore processors with Explicitly Managed Memory (EMM) hierarchies are one such experimental design which has the potential f...
Memory Hierarchy Issues in Multicore Architectures
Multicore architectures have introduced a new problem to parallel computing, namely, the management of hierarchical parallel caches. As with other architectures, a cache structure is designed to simulate a fast common memory. To address the challenge of management of these caches we a) introduce the Unified Multicore Model (UMM), a hierarchical arrangement of caches, b) present a general strate...
Evaluating multicore algorithms on the unified memory model
One of the challenges to achieving good performance on multicore architectures is the effective utilization of the underlying memory hierarchy. While this is an issue for single-core architectures, it is a critical problem for multicore chips. In this paper, we formulate the unified multicore model (UMM) to help understand the fundamental limits on cache performance on these architectures. The ...
Journal title:
Volume, issue:
Pages: -
Publication date: 2008